Indexed Map-Reduce Join Algorithm
Author
Abstract
Map-Reduce is used to process massive data sets. Data volumes are growing rapidly, and analyzing big data has become imperative. Map-Reduce extracts useful information using two simple functions, map and reduce, while providing load balancing, fault tolerance, and high scalability. The most important operation in the analysis process is the join. This paper presents a new two-way join algorithm, the Indexed Map-Reduce Join Algorithm, which uses an index on the large table to decrease I/O and shuffling and thereby improve the performance of Map-Reduce joins. Our experimental results show that the index-join algorithm outperforms other algorithms as the data size increases from 100 million to 500 million records, without memory overflow.
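The abstract does not spell out the algorithm's steps, so the following is only a minimal single-process sketch of the general idea it names: index the large table on the join key, then let each map task probe that index instead of shuffling both tables to reducers. The function names (`build_index`, `map_join`, `indexed_join`), the in-memory lists standing in for distributed files, and the `num_splits` parameter are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def build_index(large_table, key_field):
    """Map each join-key value to the row positions that hold it.

    Assumption: a plain dict stands in for an on-disk index.
    """
    index = defaultdict(list)
    for pos, row in enumerate(large_table):
        index[row[key_field]].append(pos)
    return index


def map_join(small_split, large_table, index, key_field):
    """Simulated map task: probe the index for every small-table row.

    Only large-table rows with a matching key are read, which is where
    the I/O and shuffle savings would come from in a real cluster.
    """
    for row in small_split:
        for pos in index.get(row[key_field], ()):
            yield {**large_table[pos], **row}


def indexed_join(small_table, large_table, key_field, num_splits=2):
    """Run one simulated map task per split of the small table."""
    index = build_index(large_table, key_field)
    splits = [small_table[i::num_splits] for i in range(num_splits)]
    joined = []
    for split in splits:
        joined.extend(map_join(split, large_table, index, key_field))
    return joined


if __name__ == "__main__":
    # Hypothetical toy data: "orders" plays the large table.
    orders = [
        {"cust_id": 1, "item": "disk"},
        {"cust_id": 2, "item": "cpu"},
        {"cust_id": 1, "item": "ram"},
    ]
    customers = [
        {"cust_id": 1, "name": "Ada"},
        {"cust_id": 3, "name": "Lin"},
    ]
    for record in indexed_join(customers, orders, "cust_id"):
        print(record)
```

In this sketch the join completes in the map phase, so no reduce step is needed; whether the paper's algorithm is map-only or still uses reducers is not stated in the abstract.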
Related resources
Implementation and Analysis of Join Algorithms to handle skew for the Hadoop Map/Reduce Framework
The Map/Reduce framework, a parallel processing paradigm, is widely used for large-scale distributed data processing. Map/Reduce can perform typical relational database operations such as selection, aggregation, and projection. However, binary relational operators like join, Cartesian product, and set operations are difficult to implement with Map/Reduce. Map/Reduce can process homogeneous ...
Runtime Optimization of Join Location in Parallel Data Management Systems
Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the data storage nodes, corresponding to reduce side joins, or by fetching data from the storage system to compute nodes, corresponding to map side join. Both m...
A Comparative Analysis of Join Algorithms Using the Hadoop Map/Reduce Framework
The Map/Reduce framework is a programming model recently introduced by Google Inc. to support distributed computing on very large datasets across a large number of machines. It provides a simple yet powerful way to implement distributed applications without requiring deep knowledge of parallel programming. Each participating node executes Map and/or Reduce tasks which involve reading and wri...
A Scalable and Skew-insensitive Algorithm for Join Operations using Map/Reduce Model
For over a decade, Map/Reduce has been a prominent programming model for handling vast amounts of raw data in large-scale systems. This model ensures scalability, reliability, and availability with reasonable query processing time. However, these large-scale systems still face some challenges: data skew, task imbalance, high disk I/O, and redistribution costs can have disastrous effects on...
A Unified Approach for Indexed and Non-Indexed Spatial Joins
Most spatial join algorithms either assume the existence of a spatial index structure that is traversed during the join process, or solve the problem by sorting, partitioning, or on-the-fly index construction. In this paper, we develop a simple plane-sweeping algorithm that unifies the index-based and non-index-based approaches. This algorithm processes indexed as well as non-indexed inputs, ext...